variable elimination
- North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.05)
- North America > Canada > Quebec > Montreal (0.04)
A Saddle Point Remedy: Power of Variable Elimination in Non-convex Optimization
Gan, Min, Chen, Guang-Yong, Yi, Yang, Yang, Lin
The proliferation of saddle points, rather than poor local minima, is increasingly understood to be a primary obstacle in large-scale non-convex optimization for machine learning. Variable elimination algorithms, like Variable Projection (VarPro), have long been observed to exhibit superior convergence and robustness in practice, yet a principled understanding of why they navigate these complex energy landscapes so effectively has remained elusive. In this work, we provide a rigorous geometric explanation by comparing the optimization landscapes of the original and reduced formulations. Using an analysis based on Hessian inertia and the Schur complement, we prove that variable elimination fundamentally reshapes the critical point structure of the objective function, revealing that local maxima in the reduced landscape are created from, and correspond directly to, saddle points in the original formulation. Our findings are illustrated on the canonical problem of non-convex matrix factorization, visualized directly on two-parameter neural networks, and finally validated in training deep Residual Networks, where our approach yields dramatic improvements in stability and convergence to superior minima. This work goes beyond explaining an existing method; it establishes landscape simplification via saddle point transformation as a powerful principle that can guide the design of a new generation of more robust and efficient optimization algorithms.
- Asia > China > Shandong Province > Qingdao (0.04)
- Asia > China > Jiangsu Province (0.04)
- Asia > China > Fujian Province > Fuzhou (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)
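The VarPro abstract above claims that eliminating one block of variables turns saddle points of the original objective into maxima of the reduced one. A minimal pure-Python sketch of this in the simplest matrix factorization case, assuming a rank-1 model f(u, v) = ||Y - u v^T||_F^2, is shown below. It eliminates v in closed form and evaluates the reduced objective phi(u) = min_v f(u, v). The function names and the toy matrix are illustrative assumptions, not the paper's code.

```python
# Rank-1 matrix factorization f(u, v) = ||Y - u v^T||_F^2 with the block v
# eliminated in closed form (the variable-projection idea). Illustrative sketch.

def frob_sq(Y):
    """Squared Frobenius norm ||Y||_F^2 of a list-of-rows matrix."""
    return sum(y * y for row in Y for y in row)

def full_objective(Y, u, v):
    """Original objective over both blocks of variables (u, v)."""
    return sum((Y[i][j] - u[i] * v[j]) ** 2
               for i in range(len(u)) for j in range(len(v)))

def yt_times(Y, u):
    """Compute the vector Y^T u."""
    return [sum(Y[i][j] * u[i] for i in range(len(u))) for j in range(len(Y[0]))]

def eliminate_v(Y, u):
    """Closed-form minimiser v*(u) = Y^T u / ||u||^2 for a fixed nonzero u."""
    uu = sum(x * x for x in u)
    return [c / uu for c in yt_times(Y, u)]

def reduced_objective(Y, u):
    """phi(u) = min_v f(u, v) = ||Y||_F^2 - ||Y^T u||^2 / ||u||^2."""
    uu = sum(x * x for x in u)
    return frob_sq(Y) - sum(c * c for c in yt_times(Y, u)) / uu

if __name__ == "__main__":
    Y = [[3.0, 0.0], [0.0, 1.0]]  # toy data with singular values 3 and 1
    for u in ([1.0, 0.0], [1.0, 1.0], [0.0, 1.0]):
        print(u, reduced_objective(Y, u))
```

In this toy case phi depends only on the direction of u: it is minimised (value 1.0) along the top singular vector and maximised (value 9.0) at u = (0, 1). The corresponding point (u, v) = ((0, 1), (0, 1)) is a critical point of the original f that is a saddle (f decreases when both factors tilt toward the first coordinate, and increases when u shrinks toward zero), so it appears as a maximum of the reduced landscape.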
Reviews: Beyond normality: Learning sparse probabilistic graphical models in the non-Gaussian setting
This paper presents a method for identifying the independence structure of undirected probabilistic graphical models with continuous but non-Gaussian distributions, using SING, a novel iterative algorithm based on transport maps. The authors derive an estimate for the number of samples needed to recover the exact underlying graph structure with a given probability and demonstrate empirically that SING can indeed recover this structure on two simple domains, where comparable methods that make invalid assumptions about the underlying data-generating process fail. The paper seems technically sound, with its claims and conclusions supported by both prior and newly provided theoretical and empirical results. The authors mostly do a good job of justifying their approach, but do not discuss potential issues with the algorithm. For example, what is the complexity of SING?
Lifting Factor Graphs with Some Unknown Factors
Luttermann, Malte, Möller, Ralf, Gehrke, Marcel
Lifting exploits symmetries in probabilistic graphical models by using a representative for indistinguishable objects, which allows queries to be answered more efficiently while keeping the answers exact. In this paper, we investigate how lifting enables probabilistic inference in factor graphs that contain factors whose potentials are unknown. We introduce the Lifting Factor Graphs with Some Unknown Factors (LIFAGU) algorithm to identify symmetric subgraphs in a factor graph containing unknown factors, thereby enabling known potentials to be transferred to unknown ones, which ensures well-defined semantics and allows for (lifted) probabilistic inference.
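The core lifting idea in the abstract above (one representative standing in for many indistinguishable objects, with exact answers) can be seen in a tiny hypothetical model: a binary parent A with n interchangeable binary children, all attached to A by the same potential. The sketch below compares brute-force (ground) computation of the partition function with the lifted computation. The model, the potentials, and the function names are assumptions for illustration; this is the generic lifting principle, not the LIFAGU algorithm, which additionally handles factors with unknown potentials.

```python
import itertools

# Hypothetical toy model: binary parent A with prior weights `prior`, and n
# indistinguishable binary children X_1..X_n, each linked to A by the same
# pairwise potential phi[a][x]. All weights are unnormalised potentials.

def ground_partition(prior, phi, n):
    """Partition function Z by enumerating all 2^(n+1) joint assignments."""
    z = 0.0
    for a in (0, 1):
        for xs in itertools.product((0, 1), repeat=n):
            w = prior[a]
            for x in xs:
                w *= phi[a][x]
            z += w
    return z

def lifted_partition(prior, phi, n):
    """Same Z using one representative child: sum it out once, then raise the
    result to the n-th power, since all n children are interchangeable."""
    return sum(prior[a] * (phi[a][0] + phi[a][1]) ** n for a in (0, 1))

def lifted_marginal_a1(prior, phi, n):
    """Query P(A = 1) answered exactly from the lifted computation."""
    return prior[1] * (phi[1][0] + phi[1][1]) ** n / lifted_partition(prior, phi, n)
```

The ground version costs time exponential in n, while the lifted version is linear in the number of values of A and needs only a power for the n children; both return the same value up to floating-point rounding.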
First-Order Decomposition Trees
Exact lifted inference methods, like their propositional counterparts, work by recursively decomposing the model and the problem. In the propositional case, there exist formal structures, such as decomposition trees (dtrees), that represent such a decomposition and allow us to determine the complexity of inference a priori. However, there is currently neither an equivalent structure nor analogous complexity results for lifted inference. In this paper, we introduce FO-dtrees, which upgrade propositional dtrees to the first-order level. We show how these trees can characterize a lifted inference solution for a probabilistic logical model (in terms of a sequence of lifted operations), and we analyze the complexity of lifted inference theoretically in terms of the novel notion of lifted width for the tree.
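To make concrete how a propositional dtree lets the complexity of inference be read off a priori, here is a small sketch that computes the width of a dtree using the usual propositional definitions (cutset, context, and cluster of each node). Leaves are factor scopes and internal nodes are pairs of subtrees. The representation and function names are assumptions; this covers only the propositional structure the abstract builds on, not FO-dtrees or the paper's lifted width.

```python
# A dtree is either a leaf, given as a frozenset of the factor's variables,
# or an internal node, given as a (left, right) tuple of subtrees.

def tree_vars(t):
    """Variables mentioned by the factors at the leaves of (sub)tree t."""
    if isinstance(t, frozenset):
        return t
    return tree_vars(t[0]) | tree_vars(t[1])

def clusters(t, acutset=frozenset()):
    """Yield the cluster of every node, where acutset is the union of the
    cutsets of all ancestors of t:
      cutset(T)  = (vars(left) & vars(right)) - acutset(T)
      context(T) = vars(T) & acutset(T)
      cluster(T) = vars(T) for a leaf, else cutset(T) | context(T)."""
    if isinstance(t, frozenset):
        yield t
        return
    left, right = t
    cutset = (tree_vars(left) & tree_vars(right)) - acutset
    context = tree_vars(t) & acutset
    yield cutset | context
    child_acutset = acutset | cutset
    yield from clusters(left, child_acutset)
    yield from clusters(right, child_acutset)

def width(t):
    """Width of the dtree: size of its largest cluster minus one."""
    return max(len(c) for c in clusters(t)) - 1

if __name__ == "__main__":
    chain = ((frozenset("AB"), frozenset("BC")), frozenset("CD"))
    triangle = ((frozenset("AB"), frozenset("AC")), frozenset("BC"))
    print(width(chain), width(triangle))
```

For a chain of pairwise factors the width is 1, and for a triangle of pairwise factors it is 2, matching the treewidths of those graphs.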
Probabilistic Inference with Generating Functions for Poisson Latent Variable Models
Graphical models with latent count variables arise in a number of fields. Standard exact inference techniques such as variable elimination and belief propagation do not apply to these models because the latent variables have countably infinite support. As a result, approximations such as truncation or MCMC are employed. We present the first exact inference algorithms for a class of models with latent count variables by developing a novel representation of countably infinite factors as probability generating functions, and then performing variable elimination with generating functions. Our approach is exact, runs in pseudo-polynomial time, and is much faster than existing approximate techniques. It leads to better parameter estimates for problems in population ecology by avoiding error introduced by approximate likelihood computations.
- North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
- Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
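The generating-function abstract above eliminates latent count variables by manipulating probability generating functions (PGFs) instead of truncating their infinite support. The smallest instance of this idea is a latent count N ~ Poisson(lambda) observed through binomial thinning, Y | N ~ Binomial(N, rho). Eliminating N gives G_Y(s) = G_N(1 - rho + rho s) = exp(lambda rho (s - 1)), so Y is exactly Poisson(lambda rho). The sketch below, under that assumed two-variable model, compares the exact result with the truncation approach the abstract mentions. The function names are hypothetical and this is a textbook identity, not the paper's general algorithm.

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

def binom_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def pgf_poisson(s, lam):
    """G_N(s) = E[s^N] = exp(lam * (s - 1)) for N ~ Poisson(lam)."""
    return math.exp(lam * (s - 1))

def pgf_y(s, lam, rho):
    """Eliminate N: E[s^Y] = E[(1 - rho + rho*s)^N] = G_N(1 - rho + rho*s)."""
    return pgf_poisson(1 - rho + rho * s, lam)

def p_y_exact(k, lam, rho):
    """G_Y(s) = exp(lam*rho*(s - 1)) is a Poisson PGF, so Y ~ Poisson(lam*rho)."""
    return poisson_pmf(k, lam * rho)

def p_y_truncated(k, lam, rho, n_max):
    """Approximate baseline: sum over the latent count N only up to n_max."""
    return sum(poisson_pmf(n, lam) * binom_pmf(k, n, rho)
               for n in range(k, n_max + 1))
```

With a generous truncation level the approximate sum converges to the exact value, but the PGF route needs no truncation at all.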
Locally Optimal Estimation and Control of Cable Driven Parallel Robots using Time Varying Linear Quadratic Gaussian Control
Chen, Gerry, Hutchinson, Seth, Dellaert, Frank
We present a locally optimal tracking controller for Cable Driven Parallel Robot (CDPR) control based on a time-varying Linear Quadratic Gaussian (TV-LQG) controller. In contrast to many methods that use fixed feedback gains, our time-varying controller computes the optimal gains depending on the location in the workspace and the future trajectory. Meanwhile, we rely heavily on offline computation to reduce the burden of online implementation and feasibility checking. Following the growing popularity of probabilistic graphical models for optimal control, we use factor graphs as a tool to formulate our controller for their efficiency, intuitiveness, and modularity. The topology of a factor graph encodes the relevant structural properties of equations in a way that facilitates insight and efficient computation using sparse linear algebra solvers. We first use factor graph optimization to compute a nominal trajectory, then linearize the graph and apply variable elimination to compute the locally optimal, time-varying linear feedback gains. Next, we leverage the factor graph formulation to compute the locally optimal, time-varying Kalman Filter gains, and finally combine the locally optimal linear control and estimation laws to form a TV-LQG controller. We compare the tracking accuracy of our TV-LQG controller to a state-of-the-art dual-space feed-forward controller on a 2.9 m × 2.3 m, 4-cable planar robot and demonstrate improved tracking accuracies of 0.8° and 11.6 mm root mean square error in rotation and translation, respectively.
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- Europe > France > Occitanie > Hérault > Montpellier (0.04)
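For a linear system with quadratic costs, eliminating the controls and states of the CDPR abstract's linearized factor graph backward in time yields time-varying feedback gains of the same form as the classic backward Riccati recursion. The sketch below shows that recursion for an assumed scalar system x_{t+1} = a x_t + b u_t with cost sum_t (q x_t^2 + r u_t^2) + q_final x_T^2. It is a stand-in for intuition only; the paper works with factor graphs on a real robot, and the names and parameters here are hypothetical.

```python
def lqr_gains(a, b, q, r, q_final, horizon):
    """Backward Riccati recursion for a scalar finite-horizon LQR problem.
    Returns time-varying gains K_0..K_{T-1} (control law u_t = -K_t * x_t)
    and cost-to-go weights P_0..P_T (optimal cost from x_t is P_t * x_t^2)."""
    P = [0.0] * (horizon + 1)
    K = [0.0] * horizon
    P[horizon] = q_final
    for t in range(horizon - 1, -1, -1):
        # Minimise r*u^2 + P_{t+1} * (a*x + b*u)^2 over u, giving u = -K_t * x
        K[t] = b * P[t + 1] * a / (r + b * b * P[t + 1])
        # Quadratic cost-to-go after substituting the optimal u_t
        P[t] = q + a * a * P[t + 1] - a * b * P[t + 1] * K[t]
    return K, P

def rollout_cost(a, b, q, r, q_final, gains, x0):
    """Total cost incurred by applying u_t = -K_t * x_t from initial state x0."""
    x, cost = x0, 0.0
    for k in gains:
        u = -k * x
        cost += q * x * x + r * u * u
        x = a * x + b * u
    return cost + q_final * x * x

if __name__ == "__main__":
    K, P = lqr_gains(1.0, 1.0, 1.0, 1.0, 1.0, horizon=2)
    print("gains:", K, "cost from x0=2:", rollout_cost(1.0, 1.0, 1.0, 1.0, 1.0, K, 2.0))
```

With a = b = q = r = q_final = 1 and a horizon of 2, the gains are 0.6 and then 0.5, so they vary over time even for this time-invariant system, and the rollout cost from x0 = 2 equals P_0 x0^2 = 6.4, which is lower than the 6.5 incurred by holding the gain fixed at 0.5. The Kalman filter gains mentioned in the abstract follow from the dual (forward) recursion.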
Conditional independence by typing
Gorinova, Maria I., Gordon, Andrew D., Sutton, Charles, Vakar, Matthijs
A central goal of probabilistic programming languages (PPLs) is to separate modelling from inference. However, this goal is hard to achieve in practice. Users are often forced to rewrite their models in order to improve the efficiency of inference or to meet restrictions imposed by the PPL. Conditional independence (CI) relationships among parameters are a crucial aspect of probabilistic models that captures a qualitative summary of the specified model and can facilitate more efficient inference. We present an information flow type system for probabilistic programming that captures CI relationships, and show that, for a well-typed program in our system, the distribution it implements is guaranteed to have certain CI relationships. Further, by using type inference, we can statically deduce which CI properties are present in a specified model. As a practical application, we consider the problem of how to perform inference on models with mixed discrete and continuous parameters. Inference on such models is challenging in many existing PPLs, but can be improved through a workaround, where the discrete parameters are used implicitly, at the expense of manual model rewriting. We present a source-to-source semantics-preserving transformation, which uses our CI type system to automate this workaround by eliminating the discrete parameters from a probabilistic program. The resulting program can be seen as a hybrid inference algorithm on the original program, where continuous parameters can be drawn using efficient gradient-based inference methods, while the discrete parameters are drawn using variable elimination. We implement our CI type system and its example application in SlicStan: a compositional variant of Stan.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Gulf of Mexico > Central GOM (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Research Report (0.50)
- Instructional Material (0.46)
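The workaround described in the SlicStan abstract above (summing out discrete parameters so the continuous ones see a smooth density, then recovering the discrete ones exactly by variable elimination) can be sketched on an assumed two-component Gaussian mixture with a single observation x and discrete component z. The model, the fixed shared sigma, and all names are hypothetical; this illustrates the idea by hand and is not the paper's automated transformation.

```python
import math
import random

def log_normal(x, mu, sigma):
    """log N(x | mu, sigma^2)."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def marginal_log_lik(x, weights, mus, sigma):
    """log p(x | mus) with the discrete z summed out:
    log sum_z weights[z] * N(x | mus[z], sigma^2). This is smooth in mus,
    so it can be used by a gradient-based sampler for the continuous part."""
    return log_sum_exp([math.log(w) + log_normal(x, m, sigma)
                        for w, m in zip(weights, mus)])

def posterior_z(x, weights, mus, sigma):
    """Exact conditional p(z | x, mus) obtained by eliminating z."""
    terms = [math.log(w) + log_normal(x, m, sigma) for w, m in zip(weights, mus)]
    norm = log_sum_exp(terms)
    return [math.exp(t - norm) for t in terms]

def draw_z(x, weights, mus, sigma, rng):
    """Sample the discrete parameter from its exact conditional."""
    probs = posterior_z(x, weights, mus, sigma)
    return rng.choices(range(len(weights)), weights=probs)[0]

if __name__ == "__main__":
    rng = random.Random(0)
    print(marginal_log_lik(2.0, [0.5, 0.5], [-1.0, 1.0], 1.0),
          posterior_z(2.0, [0.5, 0.5], [-1.0, 1.0], 1.0),
          draw_z(2.0, [0.5, 0.5], [-1.0, 1.0], 1.0, rng))
```

In a full hybrid sampler, the continuous parameters would be updated using marginal_log_lik alone, and z would be drawn with draw_z afterwards given their current values.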